Perceptual audio features for emotion detection
نویسندگان
چکیده
In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics that are given in perceptual evaluation of audio quality known as ITU BS.1387 recommendation. Starting from the outer and middle ear models of the auditory system, we base our features on the masked perceptual loudness which defines relatively objective criteria for emotion detection. The features computed in critical bands based on the reference concept include the partial loudness of the emotional difference, emotional difference-to-perceptual mask ratio, measures of alterations of temporal envelopes, measures of harmonics of the emotional difference, the occurrence probability of emotional blocks, and perceptual bandwidth. A soft-majority voting decision rule that strengthens the conventional majority voting is proposed to assess the classifier outputs. Compared to the state-of-the-art systems including Munich Open-Source Emotion and Affect Recognition Toolkit, Hidden Markov Toolkit, and Generalized Discriminant Analysis, it is shown that the emotion recognition rates are improved between 7-16% for EMO-DB and 7-11% in VAM for “all” and “valence” tasks.
منابع مشابه
A Comparison of Perceptual Ratings and Computed Audio Features
The backbone of most music information retrieval systems is the features extracted from audio. There is an abundance of features suggested in previous studies ranging from low-level spectral properties to high-level semantic descriptions. These features often attempt to model different perceptual aspects. However, few studies have verified if the extracted features correspond to the assumed per...
متن کاملUsing listener-based perceptual features as intermediate representations in music information retrieval.
The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. T...
متن کاملFiction database for emotion detection in abnormal situations
The present research focuses on the acquisition and annotation of vocal resources for emotion detection. We are interested in detecting emotions occurring in abnormal situations and particularly in detecting ”fear”. The present study considers a preliminary database of audiovisual sequences extracted from movie fictions. The sequences selected provide various manifestations of target emotions a...
متن کاملEfficiency of chosen speech descriptors in relation to emotion recognition
This research paper presents parametrization of emotional speech using a pool of common features utilized in emotion recognition such as fundamental frequency, formants, energy,MFCC, PLP, and LPC coefficients. The pool is additionally expanded by perceptual coefficients such as BFCC, HFCC, RPLP, and RASTA PLP, which are used in speech recognition, but not applied in emotion detection. The main ...
متن کاملUsing perceptually defined music features in music information retrieval
In this study, the notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, in order to understand the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the mu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- EURASIP J. Audio, Speech and Music Processing
دوره 2012 شماره
صفحات -
تاریخ انتشار 2012